System, Method and Computer Program Product for a
Programmable Pixel Processing Model with Instruction Set

Related Applications

The present application is a continuation-in-part of an application filed 
05/31/2000 under serial number 09/586,249, and an application filed 12/06/1999 under 
serial number 09/454,516, now issued as U.S. Pat. No.: 6,198,488. 

Field of the Invention

The present invention relates to computer graphics, and more particularly to 
providing programmability in a computer graphics processing pipeline. 

Background of the Invention 
Graphics application program interfaces (API's) have been instrumental in
allowing applications to be written to a standard interface and to be run on multiple
platforms, i.e. operating systems. Examples of such graphics API's include Open
Graphics Library (OpenGL®) and Direct 3D™ (D3D™) pipelines. OpenGL® is the
computer industry's standard graphics API for defining 2-D and 3-D graphic images.
With OpenGL®, an application can create the same effects in any operating system
using any OpenGL®-adhering graphics adapter. OpenGL® specifies a set of commands
or immediately executed functions. Each command directs a drawing action or causes
special effects.

Thus, in any computer system which supports this OpenGL® standard, the
operating system(s) and application software programs can make calls according to the
standard, without knowing exactly any specifics regarding the hardware configuration
of the system. This is accomplished by providing a complete library of low-level
graphics manipulation commands, which can be used to implement graphics operations.
A significant benefit is afforded by providing a predefined set of commands in
graphics API's such as OpenGL®. By restricting the allowable operations, such 
commands can be highly optimized in the driver and hardware implementing the 
graphics API. On the other hand, one major drawback of this approach is that changes 
to the graphics API are difficult and slow to be implemented. It may take years for a 
new feature to be broadly adopted across multiple vendors. 

With the integration of transform operations into high speed graphics chips and 
the higher integration levels allowed by semiconductor manufacturing, it is now 
possible to make part of the pipeline accessible to the application writer. There is thus 
a need to exploit this trend in order to afford increased flexibility in visual effects. In 
particular, there is a need to provide a new computer graphics programming model and 
instruction set that allows convenient implementation of changes to the graphics API, 
while preserving the driver and hardware optimization afforded by currently established 
graphics API's. 
Disclosure of the Invention

A system, method and computer program product are provided for
programmable pixel processing in a computer graphics pipeline. Initially, pixel data is
received from a source buffer. Thereafter, programmable operations are performed on
the pixel data in order to generate output. The operations are programmable in that a
user may utilize instructions from a predetermined instruction set for generating the
same. Such output is stored in a register.

In one embodiment of the present invention, the output stored in the register
may be used in performing the programmable operations on the data. Further, the pixel
data may include a position, a pixel diffuse color, a specular color, a fog value, and/or a
plurality of texture coordinates.

In still another embodiment of the present invention, an operation may be
performed involving the output. Such operation may include a scissor operation, a
color format conversion, an alpha test operation, a z-buffer/stencil operation, a blend
operation, a logic operation, a dither operation, and/or a writemask operation.

In yet another embodiment of the present invention, additional standard
operations may be performed utilizing a standard graphics application program
interface (API). For example, the API may include at least one of OpenGL® and
D3D™.

As an option, the pixel data may be negated and/or swizzled prior to performing
the programmable operations thereon. Further, the programmable operations may
include a texture fetch operation. Such texture fetch operation may involve a slope.

In still yet another embodiment, the programmable operations may support
multiple levels of precision. Such levels of precision may include full floating point,
half floating point, and fixed point. Further, the programmable operations may be
capable of converting the pixel data from a first level of precision to a second level of
precision for packing the pixel data into a destination, performing calculations, or any
other purpose. Optionally, the programmable operations may be capable of clamping
the pixel data for packing the pixel data into a destination. The programmable
operations may also be capable of removing, or "killing," the pixel data.

WO 02/103633    PCT/US02/19504

The instruction set of programmable operations may include a no operation, 
texture fetch, move, derivative, multiply, addition, multiply and addition, reciprocal, 
reciprocal square root, three component dot product, four component dot product, 
distance vector, minimum, maximum, pack, unpack, set on less than, set on greater or 
equal than, floor, fraction, kill pixel, exponential base two (2), logarithm base two (2), 
and light coefficients. 

By this design, the present invention allows a user to program a portion of the 
graphics pipeline that handles pixel processing. This results in an increased flexibility 
in generating visual effects. Further, the programmable pixel processing of the present 
invention allows remaining portions of the graphics pipeline, i.e. primitive processing, 
to be controlled by a standard graphics application program interface (API) for the 
purpose of preserving hardware optimizations. 

These and other advantages of the present invention will become apparent upon 
reading the following detailed description and studying the various figures of the drawings. 
Brief Description of the Drawings

The foregoing and other aspects and advantages are better understood from the
following detailed description of a preferred embodiment of the invention with
reference to the drawings, in which:

Figure 1 is a schematic diagram illustrating a graphics pipeline in accordance
with one embodiment of the present invention;

Figure 2 illustrates the overall operation of the various components of the
graphics pipeline of Figure 1;

Figure 3 is a schematic diagram illustrating an exemplary model of the pixel
processing module in accordance with one embodiment of the present invention;

Figure 4 is a flowchart illustrating the method by which the programming model
of Figure 3 carries out programmable pixel processing in the computer graphics
pipeline;

Figure 5 is a detailed table showing various attributes handled by the pixel
source buffer; and

Figure 6 illustrates an instruction set of programmable operations that may be
carried out by one embodiment of the present invention.
Description of the Preferred Embodiments

Figure 1 is a schematic diagram illustrating a graphics pipeline in accordance
with one embodiment of the present invention. As shown, the present embodiment
involves a plurality of modules including an attribute buffer 50, a transform module 52,
a lighting module 54, a rasterization module 56 with a set-up module 57, and a pixel
processing module 58.

As an option, each of the foregoing modules may be situated on a single
semiconductor platform. In the present description, the single semiconductor platform
may refer to a sole unitary semiconductor-based integrated circuit or chip. It should be
noted that the term single semiconductor platform may also refer to multi-chip modules
with increased connectivity which simulate on-chip operation, and make substantial
improvements over utilizing a conventional CPU and bus implementation. Of course,
the present invention may also be implemented on multiple semiconductor platforms
and/or utilizing a conventional CPU and bus implementation.

During operation, the buffer 50 is included for gathering and maintaining a
plurality of attributes. Completed vertices are processed by the transform module 52
and then sent to the lighting module 54. The transform module 52 generates parameters
for the lighting module 54 to light. The output of the lighting module 54 is screen
space data suitable for the set-up module which, in turn, sets up primitives. Thereafter,
rasterization module 56 carries out rasterization of the primitives. In particular, the
rasterization module 56 passes on pixel data including, but not limited to a position, a
pixel diffuse color, a specular color, a fog value, a plurality of texture coordinates,
and/or any other information relating to the pixels involved with the processing in the
graphics pipeline.

A pixel processing module 58 is coupled to the rasterization module 56 for
processing the pixel data. The pixel processing module 58 begins by reading the pixel
data generated by the rasterization module 56. In operation, the pixel processing
module 58 outputs a color and a depth value.

Table 1 illustrates operations that may be done after the pixel processing module
58 is finished. A standard application program interface (API) state may be used as
appropriate, as will soon become apparent.

Table 1

Scissor
Color Format Conversion
Alpha Test
Zbuffer/Stencil
Blendfunction
Logicop
Dither
Writemask

Figure 2 illustrates a high level operation 200 of the pixel processing module 58
of Figure 1. As shown, it is constantly determined in decision 202 whether the current
operation invokes a programmable pixel model of the present invention. If so, a mode
is enabled that partially supersedes the pixel processing of the standard graphics API,
thus providing increased flexibility in generating visual effects. See operation 204.

When disabled, the present invention allows increased or exclusive control of
the graphics pipeline by the standard graphics API, as indicated in operation 206. In
one embodiment, states of the standard graphics API may not be overruled by invoking
the programmable pixel mode of the present invention. In one embodiment, no
graphics API state may be directly accessible by the present invention, with the
exception of the bound texture state.

In one embodiment, the standard graphics API may include Open Graphics
Library (OpenGL®) and/or D3D™ APIs. OpenGL® is the computer industry's standard
API for defining 2-D and 3-D graphic images. With OpenGL®, an application can
create the same effects in any operating system using any OpenGL®-adhering graphics
adapter. OpenGL® specifies a set of commands or immediately executed functions.
Each command directs a drawing action or causes special effects. OpenGL® and
D3D™ APIs are commonly known to those of ordinary skill, and more information on
the same may be had by reference to the OpenGL® specification Version 2.1, which is
incorporated herein by reference in its entirety.

As is well known, OpenGL® mandates a certain set of configurable
computations defining transformation, texture coordinate generation and
transformation, and lighting. Several extensions have been developed to provide
further computations to OpenGL®.



Figure 3 is a schematic diagram illustrating an exemplary model 300 of the pixel
processing module 58 in accordance with one embodiment of the present invention.
Such programming model 300 may be adapted to work with hardware accelerators of
various configuration and/or with central processing unit (CPU) processing.

As shown in Figure 3, the pixel processing module 58 includes a functional
module 302 that is capable of carrying out a plurality of different types of operations.
The functional module 302 is equipped with three inputs and an output. Associated
with each of the three inputs are a swizzling module 304 and a negating module 306 for
purposes that will be set forth hereinafter in greater detail. Data swizzling is useful
when generating vectors. Such technique allows the efficient generation of a vector
cross product and other vectors.
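As an illustration of why per-input swizzling makes a cross product efficient, the following Python sketch forms it from two component-wise multiplies over swizzled operands and one subtraction. The helper names `swizzle` and `cross` are hypothetical and chosen only for this example; they are not part of the described instruction set.

```python
# Hypothetical helpers for illustration; not part of the described hardware.
INDEX = {"x": 0, "y": 1, "z": 2, "w": 3}

def swizzle(v, pattern):
    """Re-map the components of a 4-vector, e.g. pattern 'yzxw'."""
    return tuple(v[INDEX[c]] for c in pattern)

def cross(a, b):
    """Cross product as a.yzxw * b.zxyw - a.zxyw * b.yzxw (w ends up 0)."""
    lhs = [p * q for p, q in zip(swizzle(a, "yzxw"), swizzle(b, "zxyw"))]
    rhs = [p * q for p, q in zip(swizzle(a, "zxyw"), swizzle(b, "yzxw"))]
    return tuple(l - r for l, r in zip(lhs, rhs))

x_axis = (1, 0, 0, 0)
y_axis = (0, 1, 0, 0)
z_axis = cross(x_axis, y_axis)
```

In an instruction set with source negation, the subtraction could plausibly be folded into a multiply-and-add with a negated operand, so two instructions would suffice.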



The functional module 302 is capable of carrying out programmable operations
and supporting multiple levels of precision. Such levels of precision may include full
floating point (i.e. 32-bit), half floating point (i.e. 16-bit), and fixed point. More
information regarding the programmable operations and the various levels of precision
will be set forth hereinafter in greater detail.

Coupled to the output of the functional module 302 is an input of a register file
308 having three outputs. The register file 308 is also equipped with a vector
component writemask module 309. The register file 308 has single write and triple
read access. The contents of the register file 308 are initialized to (0,0,0,0) at the start
of program execution.

Also included are a pixel source buffer 312 and a constant source buffer 314.
The pixel source buffer 312 stores data in the form of pixel data, and may be equipped
with write access and/or at least single read access. The constant source buffer 314
stores data in the form of constant data, and may also be equipped with write access
and/or at least single read access. It may be read using an absolute address.

In one exemplary embodiment, the pixel source buffer 312 is twelve (12) quad-floats
in size (12*128 bits). Operation of the pixel processor module 58 may be
commenced when all pixel attributes are valid. The position contains x and y in integer
(D3D™) and +0.5 (OpenGL®) window coordinates, z is normalized to the range (0,1),
and 1/w is in homogeneous clip space. Such attributes may be mandatory in the current
exemplary embodiment. The pixel attributes may also be perspective correct. The
colors and fog value may be generated at a lower precision, while the texture
coordinates may be generated in high precision, i.e. 32-bit floating point. Figure 5 is a
detailed table 500 showing various attributes handled by the pixel source buffer 312.

Each of the inputs of the functional module 302 is equipped with a multiplexer
316. This allows the outputs of the register file 308, pixel source buffer 312, and
constant source buffer 314 to be fed to the inputs of the functional module 302. This is
facilitated by buses 318.

While not shown, the functional module 302 may also be coupled to a texture
fetch module for fetching texture data. Such texture fetch module may also
be coupled to the register file 308. It should be noted that frame buffer contents are
only visible to the pixel processing module 58 via texture fetches.
There need not necessarily be an explicit connection between texture
coordinates and the textures that they may access. It is possible to use the same
coordinate, or generated coordinates, to access any of the active textures as many times
as desired and in any sequence desired. Programs are allowed access to sixteen (16)
active textures. If an accessed texture is not bound, the texture fetch may return
(0,0,0,0). The texture fetch instruction specifies the texture identifier desired (i.e.
between 0 and 15). In one embodiment, texture components that are in fixed point
form may have a bias (0.0,-0.5) and a multiply operation (2x,1x) applied to them before
they are returned to the pixel processing module 58. This capability need not
necessarily apply to floating point texture components. A texture fetch may return the
data at the destination precision.

The pixel processing module 58 of Figure 3 works well with hardware
accelerators. In use, pixels are processed independently. Only one pixel is visible to the
pixel processing module 58. As an option, there may be one 4-bit condition code
register initialized as equal to 0 at program start.

Figure 4 is a flowchart illustrating the method 400 by which the model of Figure
3 carries out programmable pixel processing in the computer graphics pipeline.
Initially, in operation 402, data is received from a pixel source buffer 312. Such data
may include any type of information that is involved during the processing of pixels in
the computer graphics pipeline. Further, the pixel source buffer 312 may include any
type of memory capable of storing data.

Thereafter, in operation 404, programmable operations, i.e. pixel processing
102, are performed on the data in order to generate output. The programmable
operations are capable of generating output that may be stored in the register file 308 in
operation 406. During operation 408, the output stored in the register file 308 is used in
performing the programmable operations on the data. Thus, the register file 308 may
include any type of memory capable of allowing the execution of the programmable
operations on the output.

By this design, the present invention allows a user to program a portion of the
graphics pipeline that handles pixel processing. This results in an increased flexibility
in generating visual effects. Further, the programmable pixel processing of the present
invention allows remaining portions of the graphics pipeline to be controlled by the
standard API for the purpose of preserving hardware optimizations.

During operation, only one pixel is processed at a time in the functional module
302 that performs the programmable operations. As such, the pixels may be processed
independently. Further, the various foregoing operations may be processed for multiple
pixels in parallel.

In one embodiment of the present invention, a constant may be received, and the
programmable operations may be performed based on the constant. During operation,
the constant may be stored in and received from the constant source buffer 314.
Further, the constant may be accessed in the constant source buffer 314 using an
absolute or relative address. As an option, there may be one or more address registers
for use during reads from the constant source buffer 314. Each such address register may
be initialized to "0" at the start of program execution in operation 204 of Figure 2.
Further, the constant source buffer 314 may be written with a program which may or may
not be exposed to users.

The register file 308 may be equipped with single write and triple read access.
Register contents may be initialized to (0,0,0,0) at the start of program execution in
operation 204 of Figure 2.

Figure 6 illustrates an instruction set of programmable operations 600 that may
be carried out by the present invention, in accordance with one embodiment. As shown
in Figure 6, such programmable operations 600 include a no operation, texture fetch,
move, derivative, multiply, addition, multiply and addition, reciprocal, reciprocal
square root, three component dot product, four component dot product, distance vector,
minimum, maximum, pack, unpack, set on less than, set on greater or equal than, floor,
fraction, kill pixel, exponential base two (2), logarithm base two (2), and light
coefficients.

An exemplary assembly language will now be set forth in the context of which
the foregoing operations may be executed. Such language refers to a plurality of
resources delineated in Table 2. Note the correspondence with the various components
of the model 300 of Figure 3.



Table 2

Pixel Source           - p[*]                   of size 12 vectors (192B)
Constant Memory        - c[*]                   of size 32 vectors (512B)
Data Registers/Output  - R0-R7, H0-H15, I0-I7   of size 8, 16, 8 vectors (128B)
Condition Codes        - RC, HC, IC             of size 4 bits
Instruction Storage                             of size 128 instructions

The data registers and memory locations include four component floating point
precision. Further, the registers may be accessed as full floating point precision
(fp32:R0-R7), half floating point precision (fp16:H0-H15), or signed 12-bit fixed point
precision (s12:I0-I7). These overlap as follows: R0/H0-H1/I0-I1, R1/H2-H3/I2-I3,
R2/H4-H5/I4-I5, etc.
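The overlap pattern above can be sketched in Python as a mapping from a register name to the index of the shared storage it aliases. The function name `chunk_of` and the idea of numbering the shared storage are assumptions made for this illustration; only the pairing R0/H0-H1/I0-I1, R1/H2-H3/I2-I3, and so on is taken from the text.

```python
# Hypothetical helper: which shared storage slot a register name aliases.
REGISTER_COUNTS = {"R": 8, "H": 16, "I": 8}  # R0-R7, H0-H15, I0-I7

def chunk_of(register):
    """Return the storage index for names such as 'R1', 'H3' or 'I2'."""
    kind, index = register[0], int(register[1:])
    if kind not in REGISTER_COUNTS or not 0 <= index < REGISTER_COUNTS[kind]:
        raise ValueError("unknown register: " + register)
    # Each R register occupies one slot; H and I registers share it in pairs.
    return index if kind == "R" else index // 2

# Two names alias one another when they map to the same slot.
aliases_r1 = [name for name in ("H2", "H3", "I2", "I3", "H4")
              if chunk_of(name) == chunk_of("R1")]
```

Under this reading, writing H3 and then reading R1 touches the same storage, which is why the format of the last write matters.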



Vector components may be swizzled before use via four subscripts (xyzw). An
arbitrary component re-mapping may be done. Some examples are shown in Table 3.

Table 3

.xyzw means source(x,y,z,w) -> input(x,y,z,w)
.zzxy means source(x,y,z,w) -> input(z,z,x,y)
.xxxx means source(x,y,z,w) -> input(x,x,x,x)

Shortcuts: no subscripts refers to .xyzw (same as writemask)
           .x is the same as .xxxx
           .y is the same as .yyyy
           .z is the same as .zzzz
           .w is the same as .wwww
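The subscript shortcuts of Table 3 can be illustrated with a short Python sketch that expands a subscript string into a full four-letter pattern and applies it. The function names `expand_subscripts` and `apply_swizzle` are hypothetical and exist only for this example.

```python
# Hypothetical helpers illustrating the Table 3 shortcuts.
INDEX = {"x": 0, "y": 1, "z": 2, "w": 3}

def expand_subscripts(subscripts=""):
    """Expand '', '.x' or '.zzxy' into a four-letter swizzle pattern."""
    pattern = subscripts.lstrip(".")
    if any(c not in INDEX for c in pattern):
        raise ValueError("invalid subscript: " + subscripts)
    if pattern == "":
        return "xyzw"        # no subscripts refers to .xyzw
    if len(pattern) == 1:
        return pattern * 4   # .x is the same as .xxxx
    if len(pattern) != 4:
        raise ValueError("expected zero, one or four subscripts")
    return pattern

def apply_swizzle(source, subscripts=""):
    """Return the input vector seen by the operation after swizzling."""
    return tuple(source[INDEX[c]] for c in expand_subscripts(subscripts))

source = (1.0, 2.0, 3.0, 4.0)
remapped = apply_swizzle(source, ".zzxy")
```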

All source operands (except condition codes) may be negated by putting a '-'
sign in front. The condition codes can be changed whenever data is written (by adding
a 'c' to the op-code) and sharing the writemask with the destination. If there is no other
destination, RC or HC or IC may be used as a dummy write register. When data is
written, each component may be compared to 0.0 and its status recorded if the writemask
for that component is enabled.

The condition codes are sourced as EQ (equal), NE (not equal), LT (less),
GE (greater or equal), LE (less or equal), GT (greater), FL (false), and TR (true), which
generates four (4) bits of condition code by applying the specified comparison. As a
source (for KIL and writemask modification), the condition codes may be swizzled.

Writes to the register, condition codes, and RC are maskable. Each component
is written only if it appears as a destination subscript (from xyzw). Specifying no
writemask is the same as a writemask of xyzw. No swizzling may be possible for
writemask, and subscripts may be ordered (x before y before z before w). It is also
possible to modify the write mask by the condition codes (at the beginning of the
instruction) by an 'AND' operation as set forth in Table 4. It should be noted that
condition codes here have swizzle control.

Table 4

destination(GT.x)        //writemask[4] = 1111 & GT.xxxx
destination.xw(EQ.yyzz)  //writemask[4] = x00w & EQ.yyzz
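The 'AND' of Table 4 can be sketched in Python as follows. The function name `effective_writemask` and the representation of the four condition bits as a list of booleans in x, y, z, w order are assumptions made for this example.

```python
# Hypothetical helper illustrating Table 4.
INDEX = {"x": 0, "y": 1, "z": 2, "w": 3}

def effective_writemask(dest_mask, cond_bits, cond_swizzle="xyzw"):
    """AND the destination writemask with swizzled condition bits.

    dest_mask    -- destination subscripts, e.g. "xw" ("xyzw" if omitted)
    cond_bits    -- four booleans from a comparison such as EQ or GT
    cond_swizzle -- four-letter swizzle applied to the condition bits
    """
    return [
        (name in dest_mask) and bool(cond_bits[INDEX[cond_swizzle[i]]])
        for i, name in enumerate("xyzw")
    ]

# destination.xw(EQ.yyzz): only x survives when EQ is true for y but not z.
eq_bits = [False, True, False, True]
mask = effective_writemask("xw", eq_bits, "yyzz")
```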

An exemplary assembler format is set forth in Table 5.

Table 5

OPCODE DESTINATION, SOURCE(S)

Valid sources are the pixel source, constants, and registers. Valid destinations
are registers, RC, HC, and IC. Output data is taken from the register file 308. It should
be noted that vertex programs use the functional module 302 for output. A particular
API mode allows selection of an output format for the color and depth values, and
whether the program will generate a new depth value.

A blend function and alpha testing may or may not be available based on the
color output format. For example, a blend function and alpha testing may be available
if the selected color format is four (4) unsigned bytes. The final color is taken from
register R0, H0, or I0. The final color vector, regardless of the precision format, may be
stored into a frame buffer assuming a similarly sized color buffer.

If a depth value is to be generated, the final value of R1.x, H1.x, or I1.x holds
the new depth value. If depth is not to be generated, the standard pipeline depth is used.
Depth is normalized to a (0,1) range which is clamped and scaled by hardware to fit the
final depth buffer test format. The depth writemask may apply.

As mentioned earlier, three formats are supported for vector components. More
information regarding precision will now be set forth in the context of an exemplary
embodiment. Table 6 illustrates each of the various formats.

Table 6

Floating point:      fp32 (s.e8.m23)
Floating point:      fp16 (s.e5.m10)
Signed fixed point:  s12 (2.10 in 2's complement,
                     range of -2 to +2047/1024),

where:

fp32 refers to a 32-bit floating point precision
fp16 refers to a 16-bit floating point precision
s12 refers to fixed point precision
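To make the s12 range concrete, the following Python sketch converts a value to the nearest representable 2.10 fixed point number and clamps it to the stated range of -2 to +2047/1024. The function name `to_s12` is hypothetical, and round-to-nearest is an assumption; the text does not state a rounding mode.

```python
S12_FRACTION_BITS = 10           # 2.10 format: 10 fractional bits
S12_MIN_RAW = -(1 << 11)         # -2048, i.e. -2.0
S12_MAX_RAW = (1 << 11) - 1      # +2047, i.e. +2047/1024

def to_s12(value):
    """Quantize to s12 and clamp into the valid range.

    Assumption: round-to-nearest; the rounding mode is not given in the text.
    """
    raw = round(value * (1 << S12_FRACTION_BITS))
    raw = max(S12_MIN_RAW, min(S12_MAX_RAW, raw))
    return raw / (1 << S12_FRACTION_BITS)

largest = to_s12(5.0)    # clamps to +2047/1024
smallest = to_s12(-3.0)  # clamps to -2.0
```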
It may not necessarily be possible to mix formats inside a vector. Further, in
one embodiment, no floating point exceptions or interrupts may be supported. Denorms
may be flushed to zero, and NaN may be treated as infinity. Negative 0.0 may also be
treated as positive 0.0 in comparisons.

In 32-bit floating point mode, the RCP and RSQ instructions may deliver
mantissa results accurate to 1.0/(2**22). Moreover, the approximate output (.z) in the
EXP and LOG instructions only has to be accurate to 1.0/(2**11). The LIT instruction
output (.z) allows error equivalent to that of the EXP and LOG
combination implementing a power function.

In 16-bit floating point mode, the RCP, RSQ, LOG, and EXP instructions
deliver results accurate to within one least significant bit of the correct answer. LIT has
at least the accuracy of a LOG, multiply, and EXP sequence in 16-bit floating point
mode. In fixed point mode, all calculations are performed and then clamped into the
valid range.

Since distance is calculated as (d*d)*(1/sqrt(d*d)), 0.0 multiplied by infinity
may be 0.0. Likewise, since if/then/else evaluation is done by multiplying by 1.0/0.0
(i.e. 1.0 or 0.0) and adding, the rules set forth in Table 7 may apply.



Table 7

0.0 * x = 0.0 for all x (including infinity and NaN)
1.0 * x = x for all x (including infinity and NaN)
0.0 + x = x for all x (including infinity and NaN)
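The rules of Table 7 differ from ordinary IEEE 754 arithmetic, where 0.0 times infinity yields NaN. A minimal Python sketch of the difference follows; the function names `pixel_mul` and `pixel_add` are hypothetical.

```python
import math

def pixel_mul(a, b):
    """Multiply where 0.0 * x = 0.0 for all x, including infinity and NaN."""
    if a == 0.0 or b == 0.0:
        return 0.0
    return a * b

def pixel_add(a, b):
    """Add where 0.0 + x = x for all x, including infinity and NaN."""
    if a == 0.0:
        return b
    if b == 0.0:
        return a
    return a + b

# if/then/else by multiply-and-add: select `then_value` when flag is 1.0.
flag, then_value, else_value = 1.0, 3.0, math.inf
selected = pixel_add(pixel_mul(flag, then_value),
                     pixel_mul(1.0 - flag, else_value))
```

With ordinary floating point multiplication the unselected infinite branch would contaminate the result with NaN; under the Table 7 rules it contributes exactly 0.0.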

In one embodiment, the registers may be grouped into 128-bit chunks, each of
which may be used as a single 4*fp32 quad-float, two 4*fp16 quad-floats, or two 4*s12
quad-fixed point. There are eight (8) such chunks allowing a maximum of eight (8)
registers in fp32 mode and sixteen (16) registers in fp16. It should be noted that there
are only eight (8) s12 registers.

The present invention is allowed to use mixed precision registers as sources and
destination to an instruction. In this case, conversion to destination precision is done
before the instruction is executed. The instruction itself is performed at the destination
precision.

If a 128-bit chunk is read in a different format from which it was last written,
0.0 is returned. Pixel source and constants may be in 32-bit floating point precision, but
may be reduced to lower precision by the destination.

More information will now be set forth regarding each of the programmable
operations 600 of Figure 6.

No Operation (NOP)

Format:

NOP

Description:

No Operation.

Examples:

NOP

Texture Fetch (TEX,TXP,TXD)

Format:

TEX[c] D[.xyzw][(RC[.xyzw])],[-]S0[.xyzw],#tid
TXP[c] D[.xyzw][(RC[.xyzw])],[-]S0[.xyzw],#tid
TXD[c] D[.xyzw][(RC[.xyzw])],[-]S0[.xyzw],[-]S1[.xyzw],
       [-]S2[.xyzw],#tid

Description:

The contents of the source vector are used as a texture coordinate indexing into
the specified (via tid:0-15) texture map. The filtered vector resulting is placed into the
destination as a quad-float. TEX generates a texture fetch of (x,y,z) while TXP
generates a texture fetch of (x/w,y/w,z/w). TXD allows specification of the derivative in
x (S1) and y (S2). These may be used for LOD/anisotropic calculations. TXD generates
a texture fetch of (x,y,z).
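The difference between the TEX and TXP coordinates can be sketched in Python as below. The function names `tex_coord` and `txp_coord` are hypothetical, and the sketch covers only the coordinate computation, not filtering or the fetch itself. Behavior for w equal to zero is not addressed by the text, so the sketch simply lets Python raise an error in that case.

```python
def tex_coord(source):
    """TEX: the fetch uses (x, y, z) directly."""
    x, y, z, _w = source
    return (x, y, z)

def txp_coord(source):
    """TXP: the fetch uses the projected coordinate (x/w, y/w, z/w)."""
    x, y, z, w = source
    return (x / w, y / w, z / w)  # raises ZeroDivisionError when w == 0

coord = (2.0, 4.0, 6.0, 2.0)
direct = tex_coord(coord)
projected = txp_coord(coord)
```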



Operation:

Table 8 sets forth an example of operation associated with the TEX, TXP, and
TXD instructions.

Table 8

t.x = source0.c***;      /* c is x or y or z or w */
t.y = source0.*c**;
t.z = source0.**c*;
t.w = source0.***c;
if (-source0)
    t = -t;

q = TextureFetch(t, texid);

if (destination.x) R.x = q.x;
if (destination.y) R.y = q.y;
if (destination.z) R.z = q.z;
if (destination.w) R.w = q.w;

Examples:

TEX R2,R3,3 //Fetch from texture 3 using R3 as coords.
Derivative X (DDX)

Format:

DDX[c] D[.xyzw][(RC[.xyzw])],[-]S0[.xyzw]

Description:

DDX operates to ensure that the rate of change of the components of the source
with respect to the horizontal axis 'X' is placed into the destination.

Operation:

Table 9 sets forth an example of operation associated with the DDX instruction.

Table 9

t.x = source0.c***;      /* c is x or y or z or w */
t.y = source0.*c**;
t.z = source0.**c*;
t.w = source0.***c;
if (-source0)
    t = -t;

q.x = d(t.x)/dx;
q.y = d(t.y)/dx;
q.z = d(t.z)/dx;
q.w = d(t.w)/dx;

if (destination.x) R.x = q.x;
if (destination.y) R.y = q.y;
if (destination.z) R.z = q.z;
if (destination.w) R.w = q.w;

Examples:

DDX R2,R1 //Fetch x derivatives of R1



Derivative Y (DDY)

Format:

DDY[c] D[.xyzw][(RC[.xyzw])],[-]S0[.xyzw]

Description:

DDY operates to ensure that the rate of change of the components of the source
with respect to the vertical axis 'Y' is placed into the destination.

Operation:

Table 10 sets forth an example of operation associated with the DDY
instruction.

Table 10

t.x = source0.c***;      /* c is x or y or z or w */
t.y = source0.*c**;
t.z = source0.**c*;
t.w = source0.***c;
if (-source0)
    t = -t;

q.x = d(t.x)/dy;
q.y = d(t.y)/dy;
q.z = d(t.z)/dy;
q.w = d(t.w)/dy;

if (destination.x) R.x = q.x;
if (destination.y) R.y = q.y;
if (destination.z) R.z = q.z;
if (destination.w) R.w = q.w;

Examples:

DDY R2,R0 //Fetch y derivatives of R0



Move (MOV)

Format:

MOV[c] D[.xyzw][(RC[.xyzw])],[-]S0[.xyzw]

Description:

MOV operates to move the contents of the source into a destination.

Operation:

Table 11 sets forth an example of operation associated with the MOV
instruction.

Table 11

t.x = source0.c***;      /* c is x or y or z or w */
t.y = source0.*c**;
t.z = source0.**c*;
t.w = source0.***c;
if (-source0)
    t = -t;

q.x = t.x;
q.y = t.y;
q.z = t.z;
q.w = t.w;

if (destination.x) R.x = q.x;
if (destination.y) R.y = q.y;
if (destination.z) R.z = q.z;
if (destination.w) R.w = q.w;

Examples:

MOV RC,-R3       //Compare negative R3 to 0.0 and save
MOV R2,p[POS].w  //Move w component of p[POS] into xyzw components of R2
MOV R1.xyw,R2.x  //Move x component of R2 into x,y,w components of R1



Multiply (MUL)

Format:

MUL[c] D[.xyzw][(RC[.xyzw])],[-]S0[.xyzw],[-]S1[.xyzw]

Description:

MUL operates to multiply sources into a destination. It should be noted that 0.0
times anything is 0.0.

Operation:

Table 12 sets forth an example of operation associated with the MUL
instruction.

Table 12

t.x = source0.c***;      /* c is x or y or z or w */
t.y = source0.*c**;
t.z = source0.**c*;
t.w = source0.***c;
if (-source0)
    t = -t;

u.x = source1.c***;      /* c is x or y or z or w */
u.y = source1.*c**;
u.z = source1.**c*;
u.w = source1.***c;
if (-source1)
    u = -u;

q.x = t.x*u.x;
q.y = t.y*u.y;
q.z = t.z*u.z;
q.w = t.w*u.w;

if (destination.x) R.x = q.x;
if (destination.y) R.y = q.y;
if (destination.z) R.z = q.z;
if (destination.w) R.w = q.w;

Examples: 

MUL H6,H5,c[CON5] //H6.xyzw = H5.xyzw * c[CON5].xyzw 
MUL H6.x,H5.w,-H7 //H6.x = H5.w*-H7.x 
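
The note that 0.0 times anything is 0.0 departs from IEEE 754 multiplication, where 0.0 times infinity or NaN yields NaN. A minimal Python sketch of a component-wise multiply that follows the stated rule is given below; the function names are hypothetical, and returning positive zero in the special case is an assumption, since the text does not say which sign the zero takes.

```python
import math

def mul_component(a, b):
    """Multiply two scalars with the rule that 0.0 times anything is 0.0."""
    if a == 0.0 or b == 0.0:
        return 0.0                     # overrides the IEEE result 0*Inf = NaN
    return a * b

def mul(t, u):
    """Component-wise MUL of two 4-vectors."""
    return [mul_component(a, b) for a, b in zip(t, u)]

print(mul([0.0, 2.0, -1.5, 0.0], [math.inf, 3.0, 2.0, math.nan]))
# [0.0, 6.0, -3.0, 0.0]
```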

Add (ADD) 

Format: 

ADD[c] D[.xyzw][(RC[.xyzw])],[-]S0[.xyzw],[-]S1[.xyzw]

Description:

ADD serves to add sources into a destination.

Operation:

Table 13 sets forth an example of operation associated with the ADD 
instruction. 



Table 13

t.x = source0.c***; /* c is x or y or z or w */
t.y = source0.*c**;
t.z = source0.**c*;
t.w = source0.***c;
if (-source0)
    t = -t;

u.x = source1.c***; /* c is x or y or z or w */
u.y = source1.*c**;
u.z = source1.**c*;
u.w = source1.***c;
if (-source1)
    u = -u;

q.x = t.x+u.x;
q.y = t.y+u.y;
q.z = t.z+u.z;
q.w = t.w+u.w;

if (destination.x) R.x = q.x;
if (destination.y) R.y = q.y;
if (destination.z) R.z = q.z;
if (destination.w) R.w = q.w;



Examples:

ADD HC.x,H5.x,c[CON5] //Compare H5.x+c[CON5].x to 0.0 and set RC.x
ADD H6.x,H5,-H7 //H6.x = H5.x - H7.x
ADD H6,-H5,c[CON5] //H6.xyzw = -H5.xyzw + c[CON5].xyzw

Multiply And Add (MAD)



Format:

MAD[c] D[.xyzw][(RC[.xyzw])],[-]S0[.xyzw],[-]S1[.xyzw],[-]S2[.xyzw]

Description:

MAD serves to multiply and add sources into a destination. It should be noted
that 0.0 times anything is 0.0.

Operation:

Table 14 sets forth an example of operation associated with the MAD
instruction.

Table 14

t.x = source0.c***; /* c is x or y or z or w */
t.y = source0.*c**;
t.z = source0.**c*;
t.w = source0.***c;
if (-source0)
    t = -t;

u.x = source1.c***; /* c is x or y or z or w */
u.y = source1.*c**;
u.z = source1.**c*;
u.w = source1.***c;
if (-source1)
    u = -u;

v.x = source2.c***; /* c is x or y or z or w */
v.y = source2.*c**;
v.z = source2.**c*;
v.w = source2.***c;
if (-source2)
    v = -v;

q.x = t.x*u.x+v.x;
q.y = t.y*u.y+v.y;
q.z = t.z*u.z+v.z;
q.w = t.w*u.w+v.w;

if (destination.x) R.x = q.x;
if (destination.y) R.y = q.y;
if (destination.z) R.z = q.z;
if (destination.w) R.w = q.w;

Examples:

MAD H6,-H5,p[POS],-H3 //H6 = -H5 * p[POS] - H3
MAD H6.z,H5.w,p[POS],H5 //H6.z = H5.w * p[POS].z + H5.z



Reciprocal (RCP)

Format:

RCP[c] D[.xyzw][(RC[.xyzw])],[-]S0.[xyzw]

Description:

RCP inverts source scalar into a destination. The source may have one subscript.
Output may be exactly 1.0 if the input is exactly 1.0.

RCP(-Inf) gives (-0.0,-0.0,-0.0,-0.0)
RCP(-0.0) gives (-Inf,-Inf,-Inf,-Inf)
RCP(+0.0) gives (+Inf,+Inf,+Inf,+Inf)
RCP(+Inf) gives (0.0,0.0,0.0,0.0)

Operation:

Table 15 sets forth an example of operation associated with the RCP instruction.

Table 15

t.x = source0.c***; /* c is x or y or z or w */
t.y = source0.*c**;
t.z = source0.**c*;
t.w = source0.***c;
if (-source0)
    t = -t;

if (t.x == 1.0)
    q.x = q.y = q.z = q.w = 1.0;
else
    q.x = q.y = q.z = q.w = 1.0/t.x; where |q.x - IEEE(1.0/t.x)| < 1/(2**22) for all 1.0<=t.x<2.0

if (destination.x) R.x = q.x;
if (destination.y) R.y = q.y;
if (destination.z) R.z = q.z;
if (destination.w) R.w = q.w;



Examples:

RCP R2,c[14].x //R2.xyzw = 1/c[14].x
RCP R2.w,R3.z //R2.w = 1/R3.z
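
The special cases listed for RCP can be reproduced with ordinary IEEE arithmetic plus an explicit branch for zero. The sketch below is a hedged Python model, not the hardware algorithm: the names `rcp` and `rcp_instr` are hypothetical, and full double precision stands in for the reduced-precision reciprocal that the error bound in Table 15 permits.

```python
import math

def rcp(x):
    """Scalar reciprocal following the special cases listed for RCP."""
    if x == 1.0:
        return 1.0                     # exactly 1.0 in gives exactly 1.0 out
    if x == 0.0:                       # matches both +0.0 and -0.0
        return math.copysign(math.inf, x)
    return 1.0 / x                     # 1/+Inf is 0.0 and 1/-Inf is -0.0

def rcp_instr(src_scalar):
    """The scalar result is replicated to all four components."""
    return [rcp(src_scalar)] * 4

print(rcp_instr(-0.0))      # [-inf, -inf, -inf, -inf]
print(rcp_instr(math.inf))  # [0.0, 0.0, 0.0, 0.0]
```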

Reciprocal Square Root (RSQ)

Format:

RSQ[c] D[.xyzw][(RC[.xyzw])],[-]S0.[xyzw]

Description:

RSQ performs an inverse square root of the absolute value of source scalar into a
destination. The source may have one subscript. Output may be exactly 1.0 if the input
is exactly 1.0.

RSQ(0.0) gives (+Inf,+Inf,+Inf,+Inf)
RSQ(Inf) gives (0.0,0.0,0.0,0.0)



Operation:

Table 16 sets forth an example of operation associated with the RSQ instruction.

Table 16

t.x = source0.c***; /* c is x or y or z or w */
t.y = source0.*c**;
t.z = source0.**c*;
t.w = source0.***c;
if (-source0)
    t = -t;

if (t.x == 1.0)
    q.x = q.y = q.z = q.w = 1.0;
else
    q.x = q.y = q.z = q.w = 1.0/sqrt(abs(t.x)); with |q.x - IEEE(1.0/sqrt(t.x))| < 1/(2**22) for 1.0<=t.x<4.0

if (destination.x) R.x = q.x;
if (destination.y) R.y = q.y;
if (destination.z) R.z = q.z;
if (destination.w) R.w = q.w;

Examples:

RSQ R3,R3.y //R3 = 1/sqrt(abs(R3.y))
RSQ R2.w,p[9].x //R2.w = 1/sqrt(abs(p[9].x))



Three Component Dot Product (DP3)

Format:

DP3[c] D[.xyzw][(RC[.xyzw])],[-]S0[.xyzw],[-]S1[.xyzw]

Description:

DP3 performs a three component dot product of the sources into a destination. It
should be noted that 0.0 times anything is 0.0.

Operation: 

Table 17 sets forth an example of operation associated with the DP3 instruction. 

Table 17

t.x = source0.c***; /* c is x or y or z or w */
t.y = source0.*c**;
t.z = source0.**c*;
t.w = source0.***c;
if (-source0)
    t = -t;

u.x = source1.c***; /* c is x or y or z or w */
u.y = source1.*c**;
u.z = source1.**c*;
u.w = source1.***c;
if (-source1)
    u = -u;

q.x = q.y = q.z = q.w = t.x*u.x + t.y*u.y + t.z*u.z;

if (destination.x) R.x = q.x;
if (destination.y) R.y = q.y;
if (destination.z) R.z = q.z;
if (destination.w) R.w = q.w;

Examples:

DP3 H6,H3,H4 //H6.xyzw = H3.x*H4.x + H3.y*H4.y + H3.z*H4.z
DP3 H6.w,H3,H4 //H6.w = H3.x*H4.x + H3.y*H4.y + H3.z*H4.z

Four Component Dot Product (DP4)

Format:

DP4[c] D[.xyzw][(RC[.xyzw])],[-]S0[.xyzw],[-]S1[.xyzw]

Description:

DP4 performs a four component dot product of the sources into a destination. It 
should be noted that 0.0 times anything is 0.0. 

Operation:

Table 18 sets forth an example of operation associated with the DP4 instruction.

Table 18

t.x = source0.c***; /* c is x or y or z or w */
t.y = source0.*c**;
t.z = source0.**c*;
t.w = source0.***c;
if (-source0)
    t = -t;

u.x = source1.c***; /* c is x or y or z or w */
u.y = source1.*c**;
u.z = source1.**c*;
u.w = source1.***c;
if (-source1)
    u = -u;

q.x = q.y = q.z = q.w = t.x*u.x + t.y*u.y + t.z*u.z + t.w*u.w;

if (destination.x) R.x = q.x;
if (destination.y) R.y = q.y;
if (destination.z) R.z = q.z;
if (destination.w) R.w = q.w;

Examples:

DP4 H6,p[POS],c[MV0] //H6.xyzw = p.x*c.x + p.y*c.y + p.z*c.z + p.w*c.w
DP4 H6.xw,p[POS].w,H3 //H6.xw = p.w*H3.x + p.w*H3.y + p.w*H3.z + p.w*H3.w
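
Both dot products reduce their sources to one scalar and then replicate it to every destination component that the write mask enables. The following Python sketch illustrates that behaviour under simplifying assumptions: `dp` and `dp_instr` are hypothetical names, operands are assumed to be already swizzled, and the 0.0-times-anything rule is left out for brevity.

```python
def dp(t, u, n):
    """n-component dot product (n = 3 for DP3, n = 4 for DP4)."""
    return sum(a * b for a, b in zip(t[:n], u[:n]))

def dp_instr(dest, t, u, n, mask="xyzw"):
    """Replicate the scalar result to the masked destination components."""
    q = dp(t, u, n)
    return [q if c in mask else d for c, d in zip("xyzw", dest)]

# DP4 H6.xw,p[POS].w,H3 with p[POS].w = 2.0, so the first operand is (2,2,2,2)
h6 = dp_instr([0.0, 0.0, 0.0, 0.0], [2.0] * 4, [1.0, 2.0, 3.0, 4.0], 4, mask="xw")
print(h6)  # [20.0, 0.0, 0.0, 20.0]
```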

Distance Vector (DST)

Format:

DST[c] D[.xyzw][(RC[.xyzw])],[-]S0[.xyzw],[-]S1[.xyzw]

Description:

DST calculates a distance vector. A first source vector is assumed to be
(NA,d*d,d*d,NA) and a second source vector is assumed to be (NA,1/d,NA,1/d). A
destination vector is then (1,d,d*d,1/d). It should be noted that 0.0 times anything is 0.0.



Operation:

Table 19 sets forth an example of operation associated with the DST instruction.

Table 19

t.x = source0.c***; /* c is x or y or z or w */
t.y = source0.*c**;
t.z = source0.**c*;
t.w = source0.***c;
if (-source0)
    t = -t;

u.x = source1.c***; /* c is x or y or z or w */
u.y = source1.*c**;
u.z = source1.**c*;
u.w = source1.***c;
if (-source1)
    u = -u;

q.x = 1.0;
q.y = t.y*u.y;
q.z = t.z;
q.w = u.w;

if (destination.x) R.x = q.x;
if (destination.y) R.y = q.y;
if (destination.z) R.z = q.z;
if (destination.w) R.w = q.w;

Examples:

DST R2,R3,H4 //R2.xyzw = (1.0,R3.y*H4.y,R3.z,H4.w)
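
A short numeric check makes the DST convention easier to follow: with the first source holding d*d and the second holding 1/d, the result is (1, d, d*d, 1/d). The Python sketch below uses the hypothetical name `dst`, and the unused NA slots are filled with 0.0 purely as placeholders.

```python
def dst(t, u):
    """t is assumed to be (NA, d*d, d*d, NA); u is assumed to be (NA, 1/d, NA, 1/d)."""
    return [1.0, t[1] * u[1], t[2], u[3]]

d = 4.0
print(dst([0.0, d * d, d * d, 0.0], [0.0, 1.0 / d, 0.0, 1.0 / d]))
# [1.0, 4.0, 16.0, 0.25]
```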



Minimum (MIN)

Format:

MIN[c] D[.xyzw][(RC[.xyzw])],[-]S0[.xyzw],[-]S1[.xyzw]

Description:

MIN serves to move a minimum of sources into a destination.



Operation:

Table 20 sets forth an example of operation associated with the MIN instruction.

Table 20

t.x = source0.c***; /* c is x or y or z or w */
t.y = source0.*c**;
t.z = source0.**c*;
t.w = source0.***c;
if (-source0)
    t = -t;

u.x = source1.c***; /* c is x or y or z or w */
u.y = source1.*c**;
u.z = source1.**c*;
u.w = source1.***c;
if (-source1)
    u = -u;

q.x = (t.x < u.x) ? t.x : u.x;
q.y = (t.y < u.y) ? t.y : u.y;
q.z = (t.z < u.z) ? t.z : u.z;
q.w = (t.w < u.w) ? t.w : u.w;

if (destination.x) R.x = q.x;
if (destination.y) R.y = q.y;
if (destination.z) R.z = q.z;
if (destination.w) R.w = q.w;

Examples:

MIN R2,R3,H0 //R2 = component min(R3,H0)
MIN R2.x,R3.z,H0 //R2.x = min(R3.z,H0.x)
MIN CH,R3.z,H0 //Compare min(R3.z,H0.xyzw) to 0.0 and set RC



Maximum (MAX)

Format:

MAX[c] D[.xyzw][(RC[.xyzw])],[-]S0[.xyzw],[-]S1[.xyzw]

Description:

MAX moves a maximum of sources into a destination.



Operation:

Table 21 sets forth an example of operation associated with the MAX
instruction.

Table 21

t.x = source0.c***; /* c is x or y or z or w */
t.y = source0.*c**;
t.z = source0.**c*;
t.w = source0.***c;
if (-source0)
    t = -t;

u.x = source1.c***; /* c is x or y or z or w */
u.y = source1.*c**;
u.z = source1.**c*;
u.w = source1.***c;
if (-source1)
    u = -u;

q.x = (t.x >= u.x) ? t.x : u.x;
q.y = (t.y >= u.y) ? t.y : u.y;
q.z = (t.z >= u.z) ? t.z : u.z;
q.w = (t.w >= u.w) ? t.w : u.w;

if (destination.x) R.x = q.x;
if (destination.y) R.y = q.y;
if (destination.z) R.z = q.z;
if (destination.w) R.w = q.w;



Examples:

MAX R2,R3,H0 //R2 = component max(R3,H0)
MAX R2.w,R3.x,H0 //R2.w = max(R3.x,H0.w)



Pack2 (PK2)

Format:

PK2[c] D[.xyzw][(RC[.xyzw])],[-]S0[.xyzw]

Description:

PK2 packs two source components (.xy after swizzle) into a destination. The
destination may be a fp32 "R" register. The source components are converted into fp16
format and packed into a destination.

Operation: 

Table 22 sets forth an example of operation associated with the PK2 instruction. 



Table 22

t.x = source0.c***; /* c is x or y or z or w */
t.y = source0.*c**;
t.z = source0.**c*;
t.w = source0.***c;
if (-source0)
    t = -t;

t.x = fp16(t.x);
t.y = fp16(t.y);

q.x = q.y = q.z = q.w = ((t.x) | (t.y<<16)); /* raw bit packing */

if (destination.x) R.x = q.x;
if (destination.y) R.y = q.y;
if (destination.z) R.z = q.z;
if (destination.w) R.w = q.w;

Examples: 

PK2 R0.z,R3 // pack x,y components of R3 into R0.z
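
The raw bit packing in Table 22 can be tried out with Python's `struct` module, whose `e` format code encodes IEEE half precision. The sketch below is illustrative: `pk2` and `up2` are hypothetical helper names, the packed word is represented as a plain integer rather than as the bits of an fp32 register, and `struct` raises `OverflowError` for values outside the fp16 range instead of saturating.

```python
import struct

def pk2(x, y):
    """Pack two values as fp16 into one 32-bit word: x in the low half, y in the high half."""
    hx, = struct.unpack("<H", struct.pack("<e", x))
    hy, = struct.unpack("<H", struct.pack("<e", y))
    return hx | (hy << 16)

def up2(word):
    """Inverse operation, in the manner of UP2: split the word and decode both halves."""
    lo, = struct.unpack("<e", struct.pack("<H", word & 0xFFFF))
    hi, = struct.unpack("<e", struct.pack("<H", (word >> 16) & 0xFFFF))
    return lo, hi

packed = pk2(1.0, -2.0)
print(hex(packed))   # 0xc0003c00
print(up2(packed))   # (1.0, -2.0)
```

Values that are not exactly representable in fp16 lose precision on the way through, so a round trip is exact only for inputs such as those above.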


Pack4 (PK4)

Format:

PK4[c] D[.xyzw][(RC[.xyzw])],[-]S0[.xyzw]

Description:

PK4 packs four source components into a destination. The destination may be a
fp32 "R" register. The source components are clamped to the range (-1.008,1.0) before
being packed into a destination as unsigned 8-bit bytes.

Operation: 

Table 23 sets forth an example of operation associated with the PK4 instruction.

Table 23

t.x = source0.c***; /* c is x or y or z or w */
t.y = source0.*c**;
t.z = source0.**c*;
t.w = source0.***c;
if (-source0)
    t = -t;

q.x = t.x; if (q.x > 1.0) q.x = 1.0; else if (q.x < -1.008) q.x = -1.008;
q.y = t.y; if (q.y > 1.0) q.y = 1.0; else if (q.y < -1.008) q.y = -1.008;
q.z = t.z; if (q.z > 1.0) q.z = 1.0; else if (q.z < -1.008) q.z = -1.008;
q.w = t.w; if (q.w > 1.0) q.w = 1.0; else if (q.w < -1.008) q.w = -1.008;

ub.x = 127.0*q.x + 128; /* ub is unsigned byte vector */
ub.y = 127.0*q.y + 128;
ub.z = 127.0*q.z + 128;
ub.w = 127.0*q.w + 128;

q.x = q.y = q.z = q.w = ((ub.x) | (ub.y<<8) | (ub.z<<16) | (ub.w<<24)); /* raw bit packing */

if (destination.x) R.x = q.x;
if (destination.y) R.y = q.y;
if (destination.z) R.z = q.z;
if (destination.w) R.w = q.w;

Examples:

PK4 R0.z,R3 // pack 4 components of R3 into R0.z
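
The clamp, bias and scale steps of PK4 are easy to follow with concrete numbers. The sketch below is a hedged Python model: `pk4` is a hypothetical name, and rounding to the nearest integer when converting to a byte is an assumption, since the conversion mode is not stated in the text.

```python
def pk4(v):
    """Clamp each component to [-1.008, 1.0], map it to a byte, pack x in the low byte."""
    word = 0
    for i, c in enumerate(v):
        q = min(1.0, max(-1.008, c))
        ub = int(round(127.0 * q + 128)) & 0xFF   # rounding mode is an assumption
        word |= ub << (8 * i)
    return word

# 1.0 -> 255, 0.0 -> 128, -1.0 -> 1, and -5.0 is clamped to -1.008 -> 0
print(hex(pk4([1.0, 0.0, -1.0, -5.0])))  # 0x180ff
```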

Unpack2 (UP2) 



Format: 

UP2[c] D[.xyzw][(RC[.xyzw])],[-]S0.[xyzw]

Description:

UP2 unpacks source component into a destination. The source may be a fp32
"R" register scalar. The source component is assumed to be a packed fp16 pair.

Operation: 

Table 24 sets forth an example of operation associated with the UP2 instruction.

Table 24

t.x = source0.c***; /* c is x or y or z or w */
t.y = source0.*c**;
t.z = source0.**c*;
t.w = source0.***c;
if (-source0)
    t = -t;

q.x = q.z = (t.x>> 0) & 0xffff; /* use raw bits of t.x */
q.y = q.w = (t.x>>16) & 0xffff; /* use raw bits of t.x */

if (destination.x) R.x = q.x;
if (destination.y) R.y = q.y;
if (destination.z) R.z = q.z;
if (destination.w) R.w = q.w;

Examples:

UP2 R0.xy,R3.y // unpack two components of R3.y into R0.xy

Unpack4 (UP4)

Format:

UP4[c] D[.xyzw][(RC[.xyzw])],[-]S0.[xyzw]

Description:

UP4 unpacks source component into a destination. The source may be a fp32
"R" register scalar. The source component is assumed to be a packed unsigned 8-bit
quartet and all are biased and scaled back into the range (-1.008,1.0) before assignment
to destination.

Operation:

Table 25 sets forth an example of operation associated with the UP4 instruction.

Table 25



t.x = source0.c***; /* c is x or y or z or w */
t.y = source0.*c**;
t.z = source0.**c*;
t.w = source0.***c;
if (-source0)
    t = -t;

q.x = (t.x>> 0) & 0xff; /* use raw bits of t.x */
q.y = (t.x>> 8) & 0xff; /* use raw bits of t.x */
q.z = (t.x>>16) & 0xff; /* use raw bits of t.x */
q.w = (t.x>>24) & 0xff; /* use raw bits of t.x */

q.x = (q.x - 128)/127.0;
q.y = (q.y - 128)/127.0;
q.z = (q.z - 128)/127.0;
q.w = (q.w - 128)/127.0;

if (destination.x) R.x = q.x;
if (destination.y) R.y = q.y;
if (destination.z) R.z = q.z;
if (destination.w) R.w = q.w;



Examples:

UP4 R0,R3.x // unpack four components of R3.x into R0.xyzw
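
The bias and scale applied by UP4 is the inverse of the byte mapping used when packing: each byte b becomes (b - 128)/127.0, so 255 maps to 1.0 and 0 maps to roughly -1.008. A minimal Python sketch follows, with the hypothetical name `up4` and the packed word again represented as a plain integer.

```python
def up4(word):
    """Split a 32-bit word into four bytes (x in the low byte) and rescale each one."""
    return [(((word >> (8 * i)) & 0xFF) - 128) / 127.0 for i in range(4)]

x, y, z, w = up4(0x000180FF)   # bytes 255, 128, 1, 0
print(x, y, z, round(w, 3))    # 1.0 0.0 -1.0 -1.008
```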



Set On Less Than (SLT)

Format:

SLT[c] D[.xyzw][(RC[.xyzw])],[-]S0[.xyzw],[-]S1[.xyzw]

Description:

SLT sets the destination to 1.0/0.0 if source0 is less_than/greater_or_equal to
source1. The following relationships should be noted:

SetEQ R0,R1 = (SGE R0,R1) * (SGE -R0,-R1)
SetNE R0,R1 = (SLT R0,R1) + (SLT -R0,-R1)
SetLE R0,R1 = SGE -R0,-R1
SetGT R0,R1 = SLT -R0,-R1



Operation: 



Table 26 sets forth an example of operation associated with the SLT instruction. 



Table 26

t.x = source0.c***; /* c is x or y or z or w */
t.y = source0.*c**;
t.z = source0.**c*;
t.w = source0.***c;
if (-source0)
    t = -t;

u.x = source1.c***; /* c is x or y or z or w */
u.y = source1.*c**;
u.z = source1.**c*;
u.w = source1.***c;
if (-source1)
    u = -u;

q.x = (t.x < u.x) ? 1.0 : 0.0;
q.y = (t.y < u.y) ? 1.0 : 0.0;
q.z = (t.z < u.z) ? 1.0 : 0.0;
q.w = (t.w < u.w) ? 1.0 : 0.0;

if (destination.x) R.x = q.x;
if (destination.y) R.y = q.y;
if (destination.z) R.z = q.z;
if (destination.w) R.w = q.w;



Examples:

SLT H4,H3,H7 //H4.xyzw = (H3.xyzw < H7.xyzw ? 1.0 : 0.0)
SLT H3.xz,H6.w,H4 //H3.xz = (H6.w < H4.xyzw ? 1.0 : 0.0)
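
The relationships listed in the SLT description (SetEQ, SetNE, SetLE and SetGT built from SLT and SGE with negated operands) can be checked with a scalar model. In the Python sketch below, the `set_*` function names are hypothetical; only `slt` and `sge` correspond to instructions in the set.

```python
def slt(a, b):
    return 1.0 if a < b else 0.0

def sge(a, b):
    return 1.0 if a >= b else 0.0

def set_eq(a, b):
    return sge(a, b) * sge(-a, -b)     # a >= b and a <= b

def set_ne(a, b):
    return slt(a, b) + slt(-a, -b)     # a < b or a > b (never both)

def set_le(a, b):
    return sge(-a, -b)                 # -a >= -b is the same as a <= b

def set_gt(a, b):
    return slt(-a, -b)                 # -a < -b is the same as a > b

for a, b in [(1.0, 2.0), (2.0, 2.0), (3.0, 2.0)]:
    print(a, b, set_eq(a, b), set_ne(a, b), set_le(a, b), set_gt(a, b))
```

Negating both operands reverses the comparison, which is why two comparison instructions are enough to express all six relations.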



Set On Greater Or Equal Than (SGE)

Format:

SGE[c] D[.xyzw][(RC[.xyzw])],[-]S0[.xyzw],[-]S1[.xyzw]

Description:

SGE sets the destination to 1.0/0.0 if source0 is greater_or_equal/less_than
source1.



Operation: 



Table 27 sets forth an example of operation associated with the SGE instruction. 



Table 27

t.x = source0.c***; /* c is x or y or z or w */
t.y = source0.*c**;
t.z = source0.**c*;
t.w = source0.***c;
if (-source0)
    t = -t;

u.x = source1.c***; /* c is x or y or z or w */
u.y = source1.*c**;
u.z = source1.**c*;
u.w = source1.***c;
if (-source1)
    u = -u;

q.x = (t.x >= u.x) ? 1.0 : 0.0;
q.y = (t.y >= u.y) ? 1.0 : 0.0;
q.z = (t.z >= u.z) ? 1.0 : 0.0;
q.w = (t.w >= u.w) ? 1.0 : 0.0;

if (destination.x) R.x = q.x;
if (destination.y) R.y = q.y;
if (destination.z) R.z = q.z;
if (destination.w) R.w = q.w;

Examples:

SGE H4,H3,H7 //H4.xyzw = (H3.xyzw >= H7.xyzw ? 1.0 : 0.0)
SGE H3.xz,H6.w,H4 //H3.xz = (H6.w >= H4.xyzw ? 1.0 : 0.0)



Floor (FLR)

Format:

FLR[c] D[.xyzw][(RC[.xyzw])],[-]S0[.xyzw]

Description:

FLR sets the destination to the floor of the source.



Operation: 



Table 28 sets forth an example of operation associated with the FLR instruction.

Table 28

t.x = source0.c***; /* c is x or y or z or w */
t.y = source0.*c**;
t.z = source0.**c*;
t.w = source0.***c;
if (-source0)
    t = -t;

q.x = floor(t.x);
q.y = floor(t.y);
q.z = floor(t.z);
q.w = floor(t.w);

if (destination.x) R.x = q.x;
if (destination.y) R.y = q.y;
if (destination.z) R.z = q.z;
if (destination.w) R.w = q.w;

Examples:

FLR H4.z,R3 //H4.z = floor(R3.z)



Fraction (FRC)

Format:

FRC[c] D[.xyzw][(RC[.xyzw])],[-]S0[.xyzw]

Description:

FRC sets a destination to a fractional part of a source. The fraction is 0.0 <=
fraction < 1.0.

Operation: 

Table 29 sets forth an example of operation associated with the FRC instruction. 



Table 29

t.x = source0.c***; /* c is x or y or z or w */
t.y = source0.*c**;
t.z = source0.**c*;
t.w = source0.***c;
if (-source0)
    t = -t;

q.x = t.x - floor(t.x);
q.y = t.y - floor(t.y);
q.z = t.z - floor(t.z);
q.w = t.w - floor(t.w);

if (destination.x) R.x = q.x;
if (destination.y) R.y = q.y;
if (destination.z) R.z = q.z;
if (destination.w) R.w = q.w;



Examples:

FRC H4.z,R3 //H4.z = R3.z-floor(R3.z) 
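
Because FRC is defined as the source minus its floor, the result for a negative input is not the digits after the decimal point. A small Python sketch (function names `flr` and `frc` are illustrative) shows the difference:

```python
import math

def flr(x):
    """FLR on one component: largest integer not greater than x."""
    return float(math.floor(x))

def frc(x):
    """FRC on one component: x - floor(x), always in the range 0.0 <= f < 1.0."""
    return x - math.floor(x)

print(flr(-1.25), frc(-1.25))  # -2.0 0.75
print(flr(2.75), frc(2.75))    # 2.0 0.75
```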


Kill Pixel (KIL)

Format:

KIL RC[.xyzw]

Description:

KIL kills the pixel based on any of the RC bits (post swizzle) being TRUE. KIL
cannot set the condition codes.



Operation: 

Table 30 sets forth an example of operation associated with the KIL instruction.

Table 30

b.x = RC.c***; /* c is x or y or z or w */
b.y = RC.*c**;
b.z = RC.**c*;
b.w = RC.***c;

if (b.x | b.y | b.z | b.w)
    Kill pixel;



Examples:

KIL EQ //Kill pixel if RC x or y or z or w are = 0.0
KIL LT.x //Kill pixel if RC x bit < 0.0
KIL NE.xxzz //Kill pixel if x or z RC bits != 0.0



Exponential Base 2 (EXP) 

Format: 

EXP[c] D[.xyzw][(RC[.xyzw])],[-]S0.[xyzw]

Description:

EXP generates an approximate answer in dest.z and allows for a more accurate
answer of dest.x*FUNC(dest.y) where FUNC is some user approximation to 2**dest.y
(0.0 <= dest.y < 1.0). EXP accepts a scalar source0. Reduced precision arithmetic is
acceptable in evaluating dest.z.

EXP(-Inf) or underflow gives (0.0,0.0,0.0,1.0)
EXP(+Inf) or overflow gives (+Inf,0.0,+Inf,1.0)

Operation: 

Table 31 sets forth an example of operation associated with the EXP instruction. 

Table 31 

t.x = source0.c***; /* c is x or y or z or w */
t.y = source0.*c**;
t.z = source0.**c*;
t.w = source0.***c;
if (-source0)
    t = -t;

q.x = 2**TruncateTo-Infinity(t.x);
q.y = t.x - TruncateTo-Infinity(t.x);
q.z = q.x * APPX(q.y); where |exp(q.y*LN2)-APPX(q.y)| < 1/(2**11) for all 0<=q.y<1.0
q.w = 1.0;

if (destination.x) R.x = q.x;
if (destination.y) R.y = q.y;
if (destination.z) R.z = q.z;
if (destination.w) R.w = q.w;



Examples: 

EXP H4,R3.z 
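
The way EXP splits its input into an integer power of two and a fraction can be illustrated numerically. The Python sketch below is a simplified model for finite, moderately sized inputs: `exp_instr` is a hypothetical name, and the exact expression `2.0 ** q.y` stands in for the reduced-precision approximation APPX. The infinity, overflow and underflow cases are not modelled.

```python
import math

def exp_instr(x):
    """Return (q.x, q.y, q.z, q.w) for a finite scalar input x."""
    fl = math.floor(x)               # TruncateTo-Infinity
    qx = 2.0 ** fl                   # integer power of two
    qy = x - fl                      # fraction, 0.0 <= q.y < 1.0
    qz = qx * (2.0 ** qy)            # exact value standing in for APPX(q.y)
    return (qx, qy, qz, 1.0)

print(exp_instr(2.5))    # q.x = 4.0, q.y = 0.5, q.z is about 5.657
print(exp_instr(-1.25))  # q.x = 0.25, q.y = 0.75
```

A caller that needs more accuracy than dest.z provides can evaluate its own approximation of 2**dest.y and multiply by dest.x, as the description suggests.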

Logarithm Base 2 (LOG)

Format:

LOG[c] D[.xyzw][(RC[.xyzw])],[-]S0.[xyzw]

Description:

LOG generates an approximate answer in dest.z and allows for a more accurate
answer of dest.x+FUNC(dest.y) where FUNC is some user approximation of
log2(dest.y) (1.0 <= dest.y < 2.0). LOG accepts a scalar source0 of which the sign bit is
ignored. Reduced precision arithmetic is acceptable in evaluating dest.z.

LOG(0.0) gives (-Inf,1.0,-Inf,1.0)
LOG(Inf) gives (Inf,1.0,Inf,1.0)



Operation:

Table 32 sets forth an example of operation associated with the LOG
instruction.

Table 32

t.x = source0.c***; /* c is x or y or z or w */
t.y = source0.*c**;
t.z = source0.**c*;
t.w = source0.***c;
if (-source0)
    t = -t;

if (abs(t.x) != 0.0) {
    q.x = exponent(t.x) (-128.0 <= e < 127)
    q.y = mantissa(t.x) (1.0 <= m < 2.0)
    q.z = q.x + APPX(q.y) where |log(q.y)/LN2-APPX(q.y)| < 1/(2**11) for all 1.0<=q.y<2.0
    q.w = 1.0;
}
else {
    q.x = -inf; q.y = 1.0; q.z = -inf; q.w = 1.0;
}

if (destination.x) R.x = q.x;
if (destination.y) R.y = q.y;
if (destination.z) R.z = q.z;
if (destination.w) R.w = q.w;



Examples:

LOG H4,R3.z
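
The exponent and mantissa split used by LOG maps directly onto `math.frexp`, once its result is rescaled so that the mantissa lies in the range 1.0 <= m < 2.0. The Python sketch below is a hedged illustration: `log_instr` is a hypothetical name, `math.log2` stands in for the reduced-precision approximation APPX, and the exponent range of the hardware format is not enforced.

```python
import math

def log_instr(x):
    """Return (q.x, q.y, q.z, q.w) for a scalar input x; the sign of x is ignored."""
    a = abs(x)
    if a == 0.0:
        return (-math.inf, 1.0, -math.inf, 1.0)
    if math.isinf(a):
        return (math.inf, 1.0, math.inf, 1.0)
    m, e = math.frexp(a)             # a = m * 2**e with 0.5 <= m < 1.0
    qx, qy = float(e - 1), 2.0 * m   # rescale so that 1.0 <= q.y < 2.0
    return (qx, qy, qx + math.log2(qy), 1.0)

print(log_instr(-12.0))  # q.x = 3.0, q.y = 1.5, q.z is about 3.585
```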



Light Coefficients (LIT)

Format:

LIT[c] D[.xyzw][(RC[.xyzw])],[-]S0[.xyzw]

Description:

LIT provides lighting partial support. LIT calculates lighting coefficients from
two dot products and a power (which gets clamped to -128.0<power<128.0). Source
vector is:

Source0.x = n*l (unit normal and light vectors)
Source0.y = n*h (unit normal and halfangle vectors)
Source0.z is unused
Source0.w = power

Reduced precision arithmetic is acceptable in evaluating dest.z. Allowed error is
equivalent to a power function combining the LOG and EXP instructions
(EXP(w*LOG(y))). An implementation may support at least 8 fraction bits in the
power. It should be noted that since 0.0 times anything may be 0.0, taking any base to
the power of 0.0 yields 1.0.

Operation:

Table 33 sets forth an example of operation associated with the LIT instruction.

Table 33

t.x = source0.c***; /* c is x or y or z or w */
t.y = source0.*c**;
t.z = source0.**c*;
t.w = source0.***c;
if (-source0)
    t = -t;

if (t.w < -127.9961) t.w = -127.9961; /* assuming power is s8.8 */
else if (t.w > 127.9961) t.w = 127.9961;
if (t.x < 0.0) t.x = 0.0;
if (t.y < 0.0) t.y = 0.0;

q.x = 1.0; /* ambient */
q.y = t.x; /* diffuse */
q.z = (t.x > 0.0 ? EXP(t.w*LOG(t.y)) : 0.0); /* specular */
q.w = 1.0;

if (destination.x) R.x = q.x;
if (destination.y) R.y = q.y;
if (destination.z) R.z = q.z;
if (destination.w) R.w = q.w;



Examples:

LIT R0,R3
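
The coefficient computation in Table 33 can be followed with a scalar model. The Python sketch below is illustrative only: `lit` and `pow_via_log_exp` are hypothetical names, the specular term is evaluated exactly as 2**(w*log2(y)) rather than with reduced-precision LOG and EXP, and the dot products are assumed to come from unit vectors so that the base does not exceed 1.0.

```python
import math

def pow_via_log_exp(base, power):
    """EXP(power*LOG(base)) with the rule that 0.0 times anything is 0.0."""
    if power == 0.0:
        return 1.0                       # any base to the power 0.0 yields 1.0
    if base == 0.0:
        return 0.0 if power > 0.0 else math.inf
    return 2.0 ** (power * math.log2(base))

def lit(n_dot_l, n_dot_h, power):
    """Return (ambient, diffuse, specular, 1.0)."""
    power = max(-127.9961, min(127.9961, power))   # s8.8 power assumed
    diffuse = max(n_dot_l, 0.0)
    spec_base = max(n_dot_h, 0.0)
    specular = pow_via_log_exp(spec_base, power) if diffuse > 0.0 else 0.0
    return (1.0, diffuse, specular, 1.0)

print(lit(0.5, 0.5, 2.0))   # (1.0, 0.5, 0.25, 1.0)
print(lit(-0.2, 0.9, 4.0))  # (1.0, 0.0, 0.0, 1.0)
```

The second call shows that a surface facing away from the light receives neither a diffuse nor a specular contribution.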



Appendix A sets forth a plurality of programming examples. 



APPENDIX A 



The #define statements are meant for a cpp run.

; Absolute Value H4 = abs(R0)

MAX H4,R0,-R0;



2)

; Cross Product | i    j    k    | into R2
;               | R0.x R0.y R0.z |
;               | R1.x R1.y R1.z |

MUL R2,R0.zxyw,R1.yzxw;
MAD R2,R0.yzxw,R1.zxyw,-R2;
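
The two-instruction cross product relies on swizzles to line up the terms of each determinant minor. The Python sketch below mirrors the MUL and MAD steps so the component ordering can be checked; `swz` and `cross` are hypothetical helper names.

```python
def swz(v, s):
    """Apply a four-letter swizzle such as 'zxyw' to a 4-vector."""
    return [v["xyzw".index(c)] for c in s]

def cross(r0, r1):
    # MUL R2,R0.zxyw,R1.yzxw
    r2 = [a * b for a, b in zip(swz(r0, "zxyw"), swz(r1, "yzxw"))]
    # MAD R2,R0.yzxw,R1.zxyw,-R2
    return [a * b - c for a, b, c in zip(swz(r0, "yzxw"), swz(r1, "zxyw"), r2)]

print(cross([1.0, 2.0, 3.0, 0.0], [4.0, 5.0, 6.0, 0.0]))  # [-3.0, 6.0, -3.0, 0.0]
```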

4)

; reduce R1 to fundamental period

#define PERIOD 70; location PERIOD is 1.0/(2*PI), 2*PI, 0.0, 0.0

MUL R0,R1,c[PERIOD].x; //divide by period
FRC R2,R0;
MUL R2,R2,c[PERIOD].y; //multiply by period
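
The range reduction above scales by the reciprocal of the period, keeps the fractional part, and scales back. A scalar Python sketch of the same three steps follows; `reduce_to_period` is a hypothetical name.

```python
import math

TWO_PI = 2.0 * math.pi

def reduce_to_period(r1):
    r0 = r1 * (1.0 / TWO_PI)        # MUL R0,R1,c[PERIOD].x
    r2 = r0 - math.floor(r0)        # FRC R2,R0
    return r2 * TWO_PI              # MUL R2,R2,c[PERIOD].y

print(reduce_to_period(7.0))   # about 0.7168, i.e. 7.0 - 2*pi
print(reduce_to_period(-1.0))  # about 5.2832, i.e. 2*pi - 1.0
```

Because FRC always returns a value in the range 0.0 <= f < 1.0, negative angles are mapped into the positive period rather than mirrored.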

5)

; H4 = p->weight.x*H2 + (1.0-p->weight.x)*H3

#define IWGT 8; source weight

ADD H4,H2,-H3; //LERP
MAD H4,p[IWGT].x,H4,H3;
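
The interpolation above works because w*(H2 - H3) + H3 equals w*H2 + (1 - w)*H3, so one ADD and one MAD suffice. A short Python sketch with the hypothetical name `lerp` confirms the algebra:

```python
def lerp(h2, h3, w):
    h4 = [a - b for a, b in zip(h2, h3)]          # ADD H4,H2,-H3
    return [w * d + b for d, b in zip(h4, h3)]    # MAD H4,p[IWGT].x,H4,H3

print(lerp([2.0, 4.0, 6.0, 8.0], [0.0, 0.0, 0.0, 0.0], 0.25))  # [0.5, 1.0, 1.5, 2.0]
```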

6)

; R0 = (GT.x || LT.y) ? R1 : R2;

MOV R0,R2;
MOV R0(GT.x),R1;
MOV R0(LT.y),R1;

7)

; R0.y = (EQ.xzw && LT.y) ? R1.z : R2.w;

MOV R0.y,R1.z;
MOV R0.y(NE.xzww),R2.w;
MOV R0.y(GE.y),R2.w;

While various embodiments have been described above, it should be understood
that they have been presented by way of example only, and not limitation. Thus, the
breadth and scope of a preferred embodiment should not be limited by any of the above
described exemplary embodiments, but should be defined only in accordance with the
following claims and their equivalents.

CLAIMS

What is claimed is:

1. A method for programmable pixel processing in a computer graphics
pipeline, comprising:
(a) receiving pixel data from a source buffer;
(b) performing programmable operations on the pixel data in order to generate
output, wherein the operations are programmable by a user utilizing
instructions from a predetermined instruction set; and
(c) storing the output in a register.

2. The method as recited in claim 1, wherein the output stored in the register is
used in performing the programmable operations on the pixel data.

3. The method as recited in claim 1, wherein the pixel data includes a position,
a pixel diffuse color, a specular color, a fog value, and a plurality of texture
coordinates.

4. The method as recited in claim 1, wherein the pixel data is selected from the
group consisting of a position, a pixel diffuse color, a specular color, a fog
value, and a plurality of texture coordinates.

5. The method as recited in claim 1, and further comprising performing an
operation involving the output, the operation selected from the group
consisting of a scissor operation, a color format conversion, an alpha test
operation, a z-buffer/stencil operation, a blend operation, a logic operation, a
dither operation, and a writemask operation.

6. The method as recited in claim 1, wherein further standard operations are
performed on the pixel data utilizing a standard graphics application program
interface (API).

7. The method as recited in claim 1, wherein the output includes a color value and a
depth value.

8. The method as recited in claim 1, and further comprising negating the pixel
data prior to performing the programmable operations thereon.

9. The method as recited in claim 1, and further comprising swizzling the pixel
data prior to performing the programmable operations thereon.

10. The method as recited in claim 1, wherein the programmable operations
include a texture fetch operation.

11. The method as recited in claim 10, wherein the texture fetch operation
involves a slope.

12. The method as recited in claim 10, wherein the texture fetch operation is
capable of being used in a level of detail (LOD) calculation.

13. The method as recited in claim 1, wherein the programmable operations
support multiple levels of precision.

14. The method as recited in claim 13, wherein the levels of precision include
full floating point, half floating point, and fixed point.

15. The method as recited in claim 13, wherein the programmable operations are
capable of converting the pixel data from a first level of precision to a second
level of precision.

16. The method as recited in claim 1, wherein the programmable operations are
capable of clamping the pixel data for packing the pixel data into a
destination.

17. The method as recited in claim 1, wherein condition codes are initialized
prior to the programmable operations being performed.

18. The method as recited in claim 1, wherein the programmable operations are
capable of removing the pixel data.

19. The method as recited in claim 1, wherein the programmable operations are
selected from the group consisting of a no operation, texture fetch, move,
derivative, multiply, addition, multiply and addition, reciprocal, reciprocal
square root, three component dot product, four component dot product,
distance vector, minimum, maximum, pack, unpack, set on less than, set on
greater or equal than, floor, fraction, kill pixel, exponential base two (2),
logarithm base two (2), and light coefficients.

20. A computer program product for programmable pixel processing in a
computer graphics pipeline, comprising:
(a) computer code for receiving pixel data from a source buffer;
(b) computer code for performing programmable operations on the pixel data in
order to generate output, wherein the operations are programmable by a user
utilizing instructions from a predetermined instruction set; and
(c) computer code for storing the output in a register.

21. A system for programmable pixel processing, comprising:
(a) a source buffer for storing pixel data;
(b) a functional module coupled to the source buffer for performing
programmable operations on the pixel data received therefrom in order to
generate output, wherein the operations are programmable by a user utilizing
instructions from a predetermined instruction set; and
(c) a register coupled to the functional module for storing the output.

22. A method for programmable pixel processing in a computer graphics
pipeline, comprising:
(a) receiving pixel data from a source buffer;
(b) performing programmable operations on the pixel data in order to generate
output, wherein the operations are programmable by a user utilizing
instructions from a predetermined instruction set, and the programmable
operations support multiple levels of precision; and
(c) converting the pixel data from a first level of precision to a second level of
precision.



23. A method for programmable pixel processing in a computer graphics
pipeline, comprising:
(a) receiving pixel data from a source buffer;
(b) performing programmable operations on the pixel data including a texture
fetch in order to generate output, wherein the operations are programmable
by a user utilizing instructions from a predetermined instruction set; and
(c) storing the output in a register.

24. A method for programmable pixel processing in a computer graphics
pipeline, comprising:
(a) determining whether the graphics pipeline is operating in a programmable
mode;
(b) performing programmable operations on pixel data in order to generate
output if it is determined that the graphics pipeline is operating in the
programmable mode; and
(c) performing standard operations on the pixel data in order to generate output
in accordance with a standard graphics application program interface if it is
determined that the graphics pipeline is not operating in the programmable
mode.

1 25. The method as recited in claim 24, wherein the standard graphics application 

2 program interface includes OpenGL®. 

26. A computer program product for programmable pixel processing in a computer graphics pipeline, comprising:
(a) computer code for determining whether the graphics pipeline is operating in a programmable mode;
(b) computer code for performing programmable operations on pixel data in order to generate output if it is determined that the graphics pipeline is operating in the programmable mode; and
(c) computer code for performing standard operations on the pixel data in order to generate output in accordance with a standard graphics application program interface if it is determined that the graphics pipeline is not operating in the programmable mode.

27. The computer program product as recited in claim 26, wherein the standard graphics application program interface includes OpenGL®.

28. A system for programmable pixel processing in a computer graphics pipeline, comprising:
(a) means for determining whether the graphics pipeline is operating in a programmable mode;
(b) means for performing programmable operations on pixel data in order to generate output if it is determined that the graphics pipeline is operating in the programmable mode; and
(c) means for performing standard operations on the pixel data in order to generate output in accordance with a standard graphics application program interface if it is determined that the graphics pipeline is not operating in the programmable mode.

29. The system as recited in claim 28, wherein the standard graphics application program interface includes OpenGL®.

30. A method for programmable pixel processing in a computer graphics pipeline, comprising:
(a) determining whether the graphics pipeline is operating in a programmable mode;
(b) performing programmable operations on pixel data in order to generate output if it is determined that the graphics pipeline is operating in the programmable mode; and
(c) performing standard operations on the pixel data in order to generate output in accordance with a standard graphics application program interface if it is determined that the graphics pipeline is not operating in the programmable mode;
(d) wherein the programmable operations are selected from the group consisting of a no operation, texture fetch, move, derivative, multiply, addition, multiply and addition, reciprocal, reciprocal square root, three component dot product, four component dot product, distance vector, minimum, maximum, pack, unpack, set on less than, set on greater or equal than, floor, fraction, kill pixel, exponential base two (2), logarithm base two (2), and light coefficients.
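
By way of illustration only, and without limiting the claims, a few of the operations in the group recited in claim 30 can be sketched as functions on four-component vectors. Everything specific here is an assumption made for the example: the mnemonics (`MOV`, `MUL`, `ADD`, `MAD`, `DP3`, `DP4`, `MIN`, `MAX`, `SLT`, `SGE`, `FLR`, `FRC`), the `execute` helper, the replication of a dot product across all four components, and the use of 1.0 and 0.0 as the results of the "set on" comparisons.

```python
import math
from typing import Callable, Dict, Tuple

Vec4 = Tuple[float, float, float, float]


def _unary(f: Callable[[float], float]) -> Callable[[Vec4], Vec4]:
    # Apply f independently to each component.
    return lambda a: tuple(f(x) for x in a)


def _binary(f: Callable[[float, float], float]) -> Callable[[Vec4, Vec4], Vec4]:
    # Apply f to matching components of two operands.
    return lambda a, b: tuple(f(x, y) for x, y in zip(a, b))


def _dot(n: int) -> Callable[[Vec4, Vec4], Vec4]:
    # n-component dot product, replicated to all four outputs (an assumption).
    def op(a: Vec4, b: Vec4) -> Vec4:
        d = sum(a[i] * b[i] for i in range(n))
        return (d, d, d, d)
    return op


# Hypothetical mnemonic table covering a subset of the recited group.
OPS: Dict[str, Callable[..., Vec4]] = {
    "MOV": lambda a: a,                                   # move
    "MUL": _binary(lambda x, y: x * y),                   # multiply
    "ADD": _binary(lambda x, y: x + y),                   # addition
    "MAD": lambda a, b, c: tuple(x * y + z                # multiply and addition
                                 for x, y, z in zip(a, b, c)),
    "DP3": _dot(3),                                       # three component dot product
    "DP4": _dot(4),                                       # four component dot product
    "MIN": _binary(min),                                  # minimum
    "MAX": _binary(max),                                  # maximum
    "SLT": _binary(lambda x, y: 1.0 if x < y else 0.0),   # set on less than
    "SGE": _binary(lambda x, y: 1.0 if x >= y else 0.0),  # set on greater or equal
    "FLR": _unary(lambda x: float(math.floor(x))),        # floor
    "FRC": _unary(lambda x: x - math.floor(x)),           # fraction
}


def execute(mnemonic: str, *operands: Vec4) -> Vec4:
    """Look up one operation by mnemonic and apply it to its operands."""
    return OPS[mnemonic](*operands)
```

A user program would then be a sequence of such operations, each reading its operands from and writing its result to registers; the remaining recited operations (texture fetch, kill pixel, pack, unpack and so on) are omitted from this sketch.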

31. A method for programmable processing in a computer graphics pipeline, comprising:
(a) receiving pixel data including texture information; and
(b) performing programmable operations on the pixel data in order to generate output, wherein the operations are programmable by a user utilizing instructions from a predetermined instruction set;
(c) wherein the operations include a mathematical operation for altering the texture information of the pixel data.

32. A method for programmable processing in a computer graphics pipeline, comprising:
(a) providing pixel data including texture information; and
(b) performing programmable operations on the pixel data in order to generate output, wherein the operations are programmable by a user utilizing instructions from a predetermined instruction set;
(c) wherein the operations include a mathematical operation for altering the texture information of the pixel data.

33. A method for programmable processing in a computer graphics pipeline, comprising:
(a) receiving pixel data including color information; and
(b) performing programmable operations on the pixel data in order to generate output, wherein the operations are programmable by a user utilizing instructions from a predetermined instruction set;
(c) wherein the operations include a mathematical operation for altering the color information of the pixel data.

34. A method for programmable processing in a computer graphics pipeline, comprising:
(a) receiving pixel data including texture information and color information; and
(b) performing programmable operations on the pixel data in order to generate output, wherein the operations are programmable by a user utilizing instructions from a single instruction set.